A Compression Technique for Arabic Dictionaries: The Affix Analysis

Author

  • Abdelmajid Ben Hamadou
Abstract

In every application that concerns the automatic processing of natural language, the problem of dictionary size arises. In this paper, we propose a dictionary compression algorithm based on an affix analysis of non-diacritical Arabic. It consists of decomposing a word into its first elements, taking into account the different linguistic transformations that can affect the morphological structures. This work was carried out as part of a study of the automatic detection and correction of spelling errors in non-diacritical Arabic texts.

I. INTRODUCTION

In every application that concerns the automatic processing of natural language, the problem of dictionary size arises. This important question can be approached in several ways, in particular by grouping together the common prefixes of the words of the language. In the PIAF system (interactive program for French analysis), for instance, words are represented in chained lists following alphabetical order [COUR 77]. Example: PARTIEL → PARTIES → PARTOUT ...
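The prefix-grouping idea can be illustrated with front coding. This is a standard technique in which each word of an alphabetically sorted list stores only the number of leading characters it shares with the previous word, plus its remaining suffix. The sketch below makes several assumptions. It is not the PIAF implementation, which uses chained lists, and it is not the affix analysis proposed in this paper. The function names `front_encode` and `front_decode` are hypothetical.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(words: List[str]) -> List[Tuple[int, str]]:
    """Hypothetical helper: sort the words alphabetically, then store each one
    as (length of prefix shared with the previous word, remaining suffix)."""
    encoded: List[Tuple[int, str]] = []
    prev = ""
    for w in sorted(words):
        k = common_prefix_len(prev, w)
        encoded.append((k, w[k:]))
        prev = w
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Hypothetical helper: rebuild each word from the previous one."""
    words: List[str] = []
    prev = ""
    for k, suffix in encoded:
        w = prev[:k] + suffix
        words.append(w)
        prev = w
    return words


if __name__ == "__main__":
    lexicon = ["PARTOUT", "PARTIEL", "PARTIES"]
    enc = front_encode(lexicon)
    print(enc)  # [(0, 'PARTIEL'), (6, 'S'), (4, 'OUT')]
    assert front_decode(enc) == sorted(lexicon)
```

On the three example words, the 21 characters shrink to 11 stored characters plus three small integers. The gain depends on how many words share long prefixes once sorted.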


Similar resources

Morphological Analysis and Diacritical Arabic Text Compression

Morphological analysis of Arabic words allows decreasing the storage requirements of the Arabic dictionaries, more efficient encoding of diacritical Arabic text, faster spelling and efficient Optical character recognition. All these factors allow efficient storage and archival of multilingual digital libraries that include Arabic texts. This paper presents a lossless compression algorithm based...


Genetic Algorithms in Syllable-Based Text Compression

Syllable based text compression is a new approach to compression by symbols. In this concept syllables are used as the compression symbols instead of the more common characters or words. This new technique has proven itself worthy especially on short to middle-length text files. The effectiveness of the compression is greatly affected by the quality of dictionaries of syllables characteristic f...


“Uncommon terminations”: Proscription and morphological productivity

Discussions of the standardization of English vocabulary are seldom taken up with questions of morphology. Yet there is a history of, often strikingly similar, attempts to influence the use of particular word-formation processes, such as the proscription of individual lexical items on morphological grounds, or more precisely, the grounds that an affix is being “overextended”. This is not a refe...


Hermit Crabs: Formal Renewal of Morphology by Phonologically Mediated Affix Substitution



Logic Compression Of Dictionaries For Multilingual Spelling Checkers

To provide practical spelling checkers on micro-computers, good compression algorithms are essential. Current techniques used to compress lexicons for Indo-European languages provide efficient spelling checkers. Applying the same methods to languages which have a different morphological system (Arabic, Turkish, ...) gives insufficient results. To get better results, we apply other "logical" co...




Publication date: 1986